24 



CG 004 524 



ED 032 591 

By-Love, William A., Jr.; Stewart, Douglas K.
Interpreting Canonical Correlations: Theory and Practice. Interim Report 7. Project No. 3051.

American Institutes for Research, Palo Alto, Calif.; Pittsburgh Univ., Pa. School of Education.

Spons Agency-Office of Education (DHEW), Washington, D.C. Bureau of Research.

Bureau No-BR-5-0606
Pub Date 68
Contract-OEC-6-10-065
Note-74p.

Available from Project TALENT, American Institutes for Research, Post Office Box 1113, Palo Alto, California
94302 ($3.00 per Copy, Postpaid).

EDRS Price MF-$0.50 HC-$3.80

Descriptors-*Adolescents, *Correlation, Data Analysis, *Sociometric Techniques, *Statistical Analysis, Status,
Values

The interpretation of canonical correlations presents some problems. The
first section of this monograph addresses the interpretation of the canonical
solution. The authors suggest a summary index (R) for determining the
proportion of variance of one set predicted by another set. The relative
contributions of variables to this general index are then proposed as an
indication of the relative importance of the variables to the canonical
solution. The second section of the monograph attempts to describe the value
system used by contemporary adolescents in assigning status to other
members of their sub-culture. The sample consisted of 12th graders in seven schools.
Sociograms were constructed giving students' choices and rejections of males and
females separately. Findings concerning values and status are presented. The
authors feel that the technique used was valuable. By using the canonical solution
with R and the proportioned squared multiple correlations, one can look at the way
two sets of variables are related in multiple populations and then select the more
important variables for further study. (SJ)






Project TALENT



INTERPRETING CANONICAL 
CORRELATIONS
THEORY AND PRACTICE 



William A. Love, Jr.
and 

Douglas K. Stewart 



American Institutes for Research 

and 

School of Education, University of Pittsburgh 



1968 



Major Project TALENT Publications 

Flanagan, J. C., Dailey, J. T., Shaycoft, Marion F., Gorham, W. A., Orr, D. B., &
Goldberg, I. Designing the study. (Technical report to the U. S. Office of Education,
Cooperative Research Project No. 635.) Washington, D. C.: Project TALENT Office,
Univer. of Pittsburgh, 1960.

Flanagan, J. C., Dailey, J. T., Shaycoft, Marion F., Gorham, W. A., Orr, D. B., &
Goldberg, I. The talents of American youth. Vol. 1. Design for a study of American
youth. Boston: Houghton Mifflin, 1962.

Flanagan, J. C., Dailey, J. T., Shaycoft, Marion F., Orr, D. B., & Goldberg, I.
Studies of the American high school. (Final report to the U. S. Office of Education,
Cooperative Research Project No. 226.) Washington, D. C.: Project TALENT Office,
Univer. of Pittsburgh, 1962.

Shaycoft, Marion F., Dailey, J. T., Orr, D. B., Neyman, C. A., Jr., & Sherman, S. E.
Studies of a complete age group - Age 15. (Final report to the U. S. Office of
Education, Cooperative Research Project No. 635.) Pittsburgh: Project TALENT Office,
Univer. of Pittsburgh, 1963.

Flanagan, J. C., Davis, F. B., Dailey, J. T., Shaycoft, Marion F., Orr, D. B.,
Goldberg, I., & Neyman, C. A., Jr. The American high-school student. (Final report to
the U. S. Office of Education, Cooperative Research Project No. 635.) Pittsburgh:
Project TALENT Office, Univer. of Pittsburgh, 1964.

Flanagan, J. C., Cooley, W. W., Lohnes, P. R., Schoenfeldt, L. F., Holdeman, R. W.,
Combs, Janet, & Becker, Susan J. Project TALENT one-year follow-up studies. (Final
report to the U. S. Office of Education, Cooperative Research Project No. 2333.)
Pittsburgh: Project TALENT Office, Univer. of Pittsburgh, 1966.

Lohnes, P. R. Measuring adolescent personality. (Interim report 1 to the U. S. Office
of Education, Cooperative Research Project No. 3051.) Pittsburgh: Project TALENT
Office, Univer. of Pittsburgh, 1966.

Hall, C. E. Three papers in multivariate analysis. (Interim report 2 to the U. S.
Office of Education, Cooperative Research Project No. 3051.) Pittsburgh: Project
TALENT Office, American Institutes for Research and Univer. of Pittsburgh, 1967.

Shaycoft, Marion F. The high school years: Growth in cognitive skills. (Interim report
3 to the U. S. Office of Education, Cooperative Research Project No. 3051.) Pittsburgh:
Project TALENT Office, American Institutes for Research and Univer. of Pittsburgh, 1967.

Cureton, E. E. A factor analysis of Project TALENT tests and four other test
batteries. (Interim report 4 to the U. S. Office of Education, Cooperative Research
Project No. 3051.) Palo Alto: Project TALENT Office, American Institutes for Research
and Univer. of Pittsburgh, 1968.

Cooley, W. W., & Lohnes, P. R. Predicting development of young adults. (Interim
report 5 to the U. S. Office of Education, Cooperative Research Project No. 3051.)
Palo Alto: Project TALENT Office, American Institutes for Research and Univer. of
Pittsburgh, 1968.

Kapel, D. E. Effects of Negro density on student variables and the post-high-school
adjustment of male Negroes. (Interim report 6 to the U. S. Office of Education,
Cooperative Research Project No. 3051.) Palo Alto: Project TALENT Office, American
Institutes for Research and Univer. of Pittsburgh, 1968.

Love, W. A., Jr., & Stewart, D. K. Interpreting canonical correlations: Theory and
practice. (Interim report 7 to the U. S. Office of Education, Cooperative Research
Project No. 3051.) Palo Alto: Project TALENT Office, American Institutes for Research
and Univer. of Pittsburgh, 1968.






U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.



INTERPRETING CANONICAL CORRELATIONS: 



THEORY AND PRACTICE 



William A. Love, Jr. 
Post-Doctoral Fellow (1966-67) 
Project TALENT 

American Institutes for Research 

and 

Nova University 
and 



Douglas K. Stewart 
Post-Doctoral Fellow (1965-67) 
Project TALENT 

American Institutes for Research 

and 

University of Pittsburgh 



Interim Report 7 
Project No. 3051 
Contract No. OE-6-10-065 



The research reported herein was performed pursuant 
to a contract with the Office of Education, U.S. 
Department of Health, Education, and Welfare. Con- 
tractors undertaking such projects under Government 
sponsorship are encouraged to express freely their 
professional judgment in the conduct of the project. 
Points of view or opinions stated do not, therefore, 
necessarily represent official Office of Education 
position or policy. 






American Institutes for Research 



and 



School of Education, University of Pittsburgh 



1968 



PREFACE 



In September of 1966, the authors came to Project TALENT for a
nine-month program of study in the areas of methodology and computer
applications. This period of study was funded by Office of
Education Post-Doctoral Fellowships (O.E.G. 1-6-062084-1789). During
our stay at TALENT, we had the opportunity of working with and being
guided by many outstanding teachers. We are particularly grateful to
W. W. Cooley, who was director of Project TALENT and the man directly
responsible for bringing us there; Marion Shaycoft (associate
director); Charles Hall (director of school studies); Bary Wingersky
(director of computer systems); and Paul Lohnes (director of guidance
studies).

While we would like to express our thanks to all these people,
we would especially like to thank Dr. Lohnes for his guidance in the
work which resulted in this monograph. As a result of his multivariate
seminar in the fall and winter of 1966-67, we became interested
in canonical correlation. The problem of redundancy across batteries
came directly out of questions raised by Dr. Lohnes in this seminar.

We would also like to express our special thanks to Bary Wingersky for
deriving a set of formal proofs which undergird the work presented here.
The authors freely admit to being "empirical statisticians," by which
we mean that when a given procedure seems to be the common-sense
way to do it, we try it out with data to see how the results look. At
least one member of the Project TALENT staff has commented that we
proceed with a combination of Platonic logic and analogy. As satisfying
as this has been to the authors, we are aware that it causes some
raised eyebrows in the academic community. As a result, we turned
to Mr. Wingersky for his help and he furnished us with a set of formal
proofs, which seemed to indicate that the work presented here has some
mathematical underpinning.

The order of presentation of topics represents the evolution of
our thinking. When the problem of assessing the amount of information
in one battery of tests first came up, we began to work on this problem,
and when we developed what we felt was a solution, we drafted a paper
titled "A General Canonical Correlation Index" (a revised form of which
has since been accepted for publication by Psychological Bulletin). As
we worked on this paper, we came to have the strong feeling that the
multiple regression systems underlying the two batteries were related to
the canonical solution. We also became convinced that there was a
better way to interpret the contribution of variables to the canonical
solution than by looking at the canonical weights and loadings. As we
began to work on this problem, we developed our second paper, which has
also been submitted for publication under the title "Assessing the
Relative Importance of Variables in the Canonical Solution." In the
process of developing this second paper, we concluded that the multiple
regressions of one battery upon the other were the key, and that there was
a simple method of computing the cross-multiple correlations given the
information that is calculated in the computer program we used. A short
paper demonstrating the algorithm used to compute the cross-multiple
correlations based on the loadings of the variables on the canonical
variates has been accepted for publication by Behavioral Science.






The material covered in these papers has been combined in the
initial section of this monograph, entitled "Interpreting Canonical
Correlation." After spending several months completely immersed in
this particular technique, the authors were anxious to try it out on
some data of their own. The area of application, sociometric data,
is one with which we are substantively concerned.

We would like to thank everyone at Project TALENT, from the director
to the secretarial staff, who made this research possible. We are
aware that we absorbed an inordinate amount of everyone's time and we
are grateful for the cheerful help and guidance that was given us. The
actual data analysis for this monograph utilized the set of programs
developed by Dr. Paul Lohnes for the forthcoming second edition of the
Cooley and Lohnes book, "Multivariate Procedures for the Behavioral
Sciences."







TABLE OF CONTENTS 



Preface

List of Tables

Chapter I. INTERPRETING CANONICAL CORRELATION

Chapter II. ABILITIES, MOTIVES, SEX, AND SOCIOMETRIC STATUS
    The Sample
    Sociometric Data
    Measures of Abilities and Motives
    The Relation Among the Different Sociometric Indices

References

Appendix







LIST OF TABLES 



Chapter I

Table 1   Factor Structure
Table 2   Components of Redundancy Measure
Table 3   Canonical Loadings and Correlations
Table 4   Proportioned R^2

Chapter II

Table 1   Correlations between Sociometric Indices
Table 2   Canonical Correlations between Male Ability
          Factors and Sociometric Status
Table 3   Canonical Correlations between Female Ability
          Factors and Sociometric Status
Table 4   Canonical Correlations between Male Motives
          Factors and Sociometric Status
Table 5   Canonical Correlations between Female Motives
          Factors and Sociometric Status
Table 6   Canonical Correlations between Male Sociometric
          Status as Assigned by Females and Male
          Ability Variables
Table 7   Canonical Correlations between Female
          Sociometric Status as Assigned by Males
          and Female Ability Variables
Table 8   Canonical Correlations between Male Sociometric
          Status as Assigned by Females and Male Motive
          Factors
Table 9   Canonical Correlations between Female
          Sociometric Status as Assigned by Males and
          Female Motive Factors

Appendix A

Table A   Abilities Domain
Table B   Motives Domain



Chapter I 



INTERPRETING CANONICAL CORRELATION 



The interpretation of canonical correlations presents some problems.
Whereas a squared multiple correlation represents the proportion of
criterion variance predicted by the optimal linear combination of
predictors, a squared canonical correlation represents the variance shared
by linear composites of two sets of variables, and not the shared variance
of the two sets.

Unfortunately, therefore, canonical correlations cannot be
interpreted as correlations between sets of variables. It is important
to note that a relatively strong canonical correlation may obtain
between two linear functions, even though these linear functions may not
extract significant portions of variance from their respective batteries.
This is the problem of interpretation to which this paper is addressed.

Rozeboom (1965) has suggested the relevance of information
theoretic concepts in dealing with canonical correlations. Uncertainty
and alienation are considered parallel, and similarly, redundancy and
correlation are treated as analogous. Given this approach, Rozeboom
develops a general index which is similar to one presented by Anderson
(1958, p. 244). Both measures are symmetric, i.e., given two sets of
variables, a single number represents the magnitude of
their intersection. A directional or non-symmetric index is possible
by pursuing the information theoretic analogies suggested by Rozeboom.

In addition to the primitive concept of uncertainty (or entropy),
Shannon (Shannon and Weaver, 1949) discusses conditional uncertainty.
Similarly, one may discuss the complement of conditional uncertainty
as conditional redundancy. A non-symmetric measure is considered
desirable because one set of variables may be almost completely subsumed
by a larger set, i.e., redundancy can be represented as the intersection
of two sets of variables, and it is desirable to represent the proportion
of one set which is in the intersection (see Fig. 1).

FIGURE 1




In the case pictured in Figure 1, it is clear that most of set A is 
contained in set B, whereas a relatively large portion of set B is outside 
the intersection. This paper proposes an index based on canonical 
correlation which is non-symmetric and has been worthwhile in the 
analysis of various partitioned matrices. 

If we were to factor analyze two sets of variables independently
and then develop weights which would rotate the two factor structures
to maximum correlation, we would have a canonical solution (Hotelling,
1935). In the canonical case the factors are usually referred to as
canonical variates. The correlation between the first factor of the
left set and the first factor of the right set is the first canonical






correlation, R_c. In order to take advantage of the well-developed
language of factor analysis, we shall call them canonical factors.

Since the complete factor structure of a set of variables will
contain as many factors as there are variables,^1 it is obvious that if
the larger set is composed of five variables and the smaller set of
three variables, only three factors can be extracted from the smaller
set. As a result, R_c's are available between three of the factors of
the larger set and the three factors of the smaller set. The remaining
two factors in the larger set have no counterpart in the smaller set
and do not enter into the canonical solution.
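
The count of canonical correlations described above can be checked numerically. The following is a minimal sketch, not the program used by the authors: it assumes NumPy, made-up random data, and a hypothetical helper named `canonical_correlations`. It obtains the correlations from a QR decomposition followed by a singular value decomposition, which is a different route from the factor-rotation description in the text, and it assumes both sets have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y (rows = cases).

    Hypothetical helper for illustration; assumes full column rank.
    """
    # Center each variable so that inner products behave like covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the two sets.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx'Qy are the canonical correlations,
    # returned in descending order.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

# Illustrative data only: a larger set of five variables and a smaller set of three.
rng = np.random.default_rng(0)
n = 500
larger = rng.standard_normal((n, 5))
smaller = larger[:, :3] @ rng.standard_normal((3, 3)) + rng.standard_normal((n, 3))

r_c = canonical_correlations(larger, smaller)
print(len(r_c))  # one canonical correlation per variable in the smaller set
```

With five variables in one set and three in the other, the sketch returns three canonical correlations; the two remaining dimensions of the larger set have no partner.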

In the traditional interpretation of canonical correlations,
the magnitude of the R_c's, whether or not they are significantly
non-zero, and the weights used to obtain the R_c's are considered (Cooley
and Lohnes, 1962). The interpretation of these weights has all the
problems attendant to the beta weights of common multiple regression.



At the suggestion of Meredith (1964), some investigators now compute
the correlations between the variables in a set and the canonical
factors of that set (the factor loadings of factor analytic parlance).^2

Before we consider a method of calculating an index of redundancy
we should agree on vocabulary. We need one index for the redundancy in
the left set given the right and another index for the reverse relation.
For the sake of simplicity, we will consider one set of variables as the
predictor or conditioning set and the other set as the criterion, as in
multiple regression. We talk about the proportion of variance in the
criterion accounted for by the predictors, but seldom if ever consider
the reverse relationship. It is obvious that by reversing our definition
of criterion and predictor we could develop the index going in the other
direction.

^1 This is true only where the rank of the matrix equals the order.
In general this is the case and will be assumed in this paper.

^2 This proposal will be utilized in the forthcoming second edition
of Cooley and Lohnes.

The canonical factors of the predictor set will be FP_i and
similarly FC_i for the criterion set. The variables of the predictor
and criterion sets will be P_j and C_j respectively. Since the index
about to be proposed utilizes the concept of a factor extracting a
proportion of the variance (more appropriately proportion of trace) of
a set of variables (usually a battery of tests), we will define the
sum of the squared loadings of variables within a set on a
canonical factor of the set as the variance extracted by that factor.
When this is divided by the number of variables in the set (M), the
resulting value is the proportion of the variance of the set extracted
by that canonical factor. This will be symbolized as VP_i and VC_i. The
squared canonical correlations (R_c^2) will be written as λ_i (following
Cooley and Lohnes, 1962). This is the proportion of variance in one
of the ith pair of canonical variates predictable from the other member
of the pair. If VC_i is multiplied by λ_i, the resulting figure
is the proportion of the variance of the C set explained by correlation
between FP_i and FC_i. If this value is calculated for each of the M_C
pairs of canonical factors and the values are summed, the result is an
index of the proportion of variance of C predictable from P, or the
redundancy in C given P.



\[
R \;=\; \sum_{k=1}^{M_C} \lambda_k \, VC_k
  \;=\; \sum_{k=1}^{M_C} \lambda_k \left[ \frac{\sum_{j=1}^{M_C} L_{jk}^{2}}{M_C} \right]
\]

(where L_{jk} is the correlation between the jth variable and kth
canonical factor.)
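
The index can be traced step by step in code. What follows is a minimal sketch under stated assumptions, not the authors' program: it assumes NumPy and simulated data, the names `redundancy_index` and `mean_squared_multiple_correlation` are hypothetical, and the canonical solution is obtained by QR and singular value decomposition. As a numerical check, the sketch compares the index with the average squared multiple correlation of each criterion variable on the predictor set, which agrees algebraically when all canonical pairs are summed. The simulated sets mimic Figure 1: set A is almost entirely contained in the larger set B.

```python
import numpy as np

def redundancy_index(P, C):
    """Redundancy in criterion set C given predictor set P (rows = cases).

    Hypothetical helper; returns (index, lambda_k, VC_k).
    """
    # Standardize so that loadings are correlations.
    Pz = (P - P.mean(axis=0)) / P.std(axis=0)
    Cz = (C - C.mean(axis=0)) / C.std(axis=0)
    n = Pz.shape[0]
    Qp, _ = np.linalg.qr(Pz)
    Qc, _ = np.linalg.qr(Cz)
    _, s, Vt = np.linalg.svd(Qp.T @ Qc, full_matrices=False)
    lam = s ** 2                      # squared canonical correlations (lambda_k)
    FC = Qc @ Vt.T * np.sqrt(n)       # criterion canonical factors, unit variance
    L = Cz.T @ FC / n                 # loadings L[j, k] of C variables on FC_k
    VC = (L ** 2).mean(axis=0)        # proportion of C variance extracted by FC_k
    return float(np.sum(lam * VC)), lam, VC

def mean_squared_multiple_correlation(P, C):
    """Average R^2 from regressing each variable of C on all of P (check only)."""
    Pc = P - P.mean(axis=0)
    Cc = C - C.mean(axis=0)
    fitted = Pc @ np.linalg.lstsq(Pc, Cc, rcond=None)[0]
    r2 = (fitted ** 2).sum(axis=0) / (Cc ** 2).sum(axis=0)
    return float(r2.mean())

# Simulated data: B has five variables; A is two of them plus a little noise.
rng = np.random.default_rng(1)
n = 400
B = rng.standard_normal((n, 5))
A = B[:, :2] + 0.1 * rng.standard_normal((n, 2))

r_a_given_b, lam_ab, vc_ab = redundancy_index(B, A)   # A as criterion
r_b_given_a, lam_ba, vc_ba = redundancy_index(A, B)   # B as criterion
check_a = mean_squared_multiple_correlation(B, A)

print(r_a_given_b, r_b_given_a)
```

The two directions give different values: nearly all of A is predictable from B, while most of B lies outside the intersection, which is the non-symmetric behavior the index is meant to capture.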






